Nature Genetics
Springer Science and Business Media LLC
All preprints, ranked by how well they match Nature Genetics's content profile, based on 240 papers previously published here. The average preprint has a 0.33% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Di Vittori, V.; Bitocchi, E.; Rodriguez, M.; Alseekh, S.; Bellucci, E.; Nanni, L.; Gioia, T.; Marzario, S.; Logozzo, G.; Rossato, M.; De Quattro, C.; Murgia, M. L.; Ferreira, J. J.; Campa, A.; Xu, C.; Fiorani, F.; Sampathkumar, A.; Fröhlich, A.; Attene, G.; Delledonne, M.; Usadel, B.; Fernie, A. R.; Rau, D.; Papa, R.
In legumes, pod shattering occurs when mature pods dehisce along the sutures, and detachment of the valves promotes seed dispersal. In Phaseolus vulgaris, the major locus qPD5.1-Pv for pod indehiscence was identified recently. We developed a BC4/F4 introgression line population and narrowed the major locus down to a 22.5-kb region. Here, gene expression and a parallel histological analysis of dehiscent and indehiscent pods identified an AtMYB26 orthologue as the best candidate for loss of pod shattering, on a genomic region ~11 kb downstream of the highest associated peak. Based on mapping and expression data, we propose early and fine up-regulation of PvMYB26 in dehiscent pods. Detailed histological analysis establishes that pod indehiscence is associated with the lack of a functional abscission layer in the ventral sheath, and that the key anatomical modifications associated with pod shattering in common bean occur early during pod development. We finally propose that loss of pod shattering in legumes resulted from histological convergent evolution and that this is the result of selection at orthologous loci.
One-sentence summary: A non-functional abscission layer determines the loss of pod shattering; mapping data, and parallel gene expression and histological analysis support PvMYB26 as the candidate gene for pod indehiscence.
Noyvert, B.; Erzurumluoglu, A. M.; Drichel, D.; Omland, S.; Andlauer, T. F. M.; Mueller, S.; Sennels, L.; Becker, C.; Kantorovich, A.; Bartholdy, B. A.; Braenne, I.; Bolivar-Lopez, J. C.; Mistrellides, C.; Belbin, G. M.; Li, J. H.; Pickrell, J. K.; de Jong, J.; Arora, J.; Hu, Y.; Boehringer Ingelheim - Global Computational Biology and Digital Sciences; Wood, C. R.; Kriegl, J. M.; Podduturi, N.; Jensen, J. N.; Stutzki, J.; Ding, Z.
Advancements in long-read sequencing technology have accelerated the study of large structural variants (SVs). We created a curated, publicly available, multi-ancestry SV imputation panel by long-read sequencing 888 samples from the 1000 Genomes Project. This high-quality panel was used to impute SVs in approximately 500,000 UK Biobank participants. We demonstrated the feasibility of conducting genome-wide SV association studies at biobank scale using 32 disease-relevant phenotypes related to respiratory, cardiometabolic and liver diseases, in addition to 1,463 protein levels. This analysis identified thousands of genome-wide significant SV associations, including hundreds of conditionally independent signals, thereby enabling novel biological insights. Focusing on genetic association studies of lung function as an example, we demonstrate the added value of SVs for prioritising causal genes at gene-rich loci compared to traditional GWAS using only short variants. We envision that future post-GWAS gene-prioritisation workflows will incorporate SV analyses using this SV imputation panel and framework.
Maxwell, J.; Mitchell, B. L.; DuHarpur, X.; Pardo, L. M.; Witkam, W. C. A. M.; Dand, N.; Bartels, M.; Betti, M. J.; Boomsma, D. I.; Dong, X.; Gerring, Z.; Finer, S.; Genes & Health Research Team; Hagenbeek, F. A.; Hottenga, J. J.; Hripcsak, G.; Huilaja, L.; Hveem, K.; Jacobs, B. M.; Kals, M.; Kaufman-Cook, J.; Kettunen, J.; Khan, A.; Kingo, K.; Kiryluk, K.; Loset, M.; Lunter, G.; Lupton, M. K.; Min, J. L.; Martin, N. G.; Medland, S. E.; Neijzen, D.; Nijsten, T. E. C.; Nikopensius, T.; Olsen, C. M.; Petukhova, L.; Reigo, A.; Renteria, M. E.; Rispoli, R.; Saklatvala, J.; Sliz, E.; Tasanen-Maa
Over 85% of the population experience acne at some point in their lives, with its severity spanning a quantitative spectrum, from mild, transient outbreaks to more persistent, severe forms of the condition. Moderate to severe disease poses a substantial global burden arising from both the physical and psychological impacts of this highly visible condition. The analytical approach taken in this study aimed to address the impact of variation in the dichotomisation of acne case-control status, driven by ascertainment and study design, on effect size estimates across independent genetic association studies of acne. Through a fixed intercept meta-regression framework, we combined evidence genome-wide for association with acne across studies in which case-control status had been ascertained in different settings, allowing for different severity threshold definitions. Across a combined sample of 73,997 cases and 1,103,940 controls of European, South Asian and African American ancestry we identify genetic variation at 165 genomic loci that influence acne risk. There is evidence for both shared and ancestry-specific components to the genetic susceptibility to acne and for sex differences in the magnitude of effect of risk alleles at three loci. We observe that common genetic variation explains 13.4% of acne heritability on the liability scale. Consistent with the hypothesis that genetic risk primarily operates at the level of individual pilosebaceous units, a polygenic score derived from this case-control study of acne susceptibility is associated with both self-reported and clinically assessed acne severity in adolescence, further strengthening the link between genetic risk and disease severity. Prioritisation of causal genes at the identified acne risk loci provides genetic validation of the targets of established and emerging acne therapies, including retinoid treatments. The identified acne risk loci are enriched for genes encoding downstream effectors of RXRA signalling, including SOX9 and components of the WNT and p53 pathways, illustrating that the control of stem cell lineage plasticity and cellular fate are important mechanisms through which genetic variation influences acne susceptibility within the pilosebaceous unit.
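A heritability figure "on the liability scale", as quoted in the acne abstract above, is usually obtained by rescaling an observed-scale estimate from a case-control sample. The sketch below shows that standard rescaling, h2_liab = h2_obs * K^2 (1-K)^2 / (z^2 P (1-P)), where K is the population prevalence, P the fraction of cases in the sample, and z the standard normal density at the liability threshold. The abstract does not say which conversion or prevalence the authors used, so the function name and every input value here are hypothetical.

```python
from statistics import NormalDist


def liability_scale_h2(h2_observed, prevalence, case_fraction):
    """Rescale an observed-scale heritability estimate to the liability scale.

    prevalence    -- assumed population prevalence K (hypothetical input)
    case_fraction -- proportion of cases P in the analysed sample
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Liability threshold above which an individual is a case.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    # Height of the standard normal density at that threshold.
    z = std_normal.pdf(threshold)
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


# Illustrative call with made-up numbers, not values from the study.
example = liability_scale_h2(h2_observed=0.05, prevalence=0.10, case_fraction=0.06)
```

The rescaling depends strongly on the assumed prevalence, which is one reason liability-scale estimates from studies with different case definitions are hard to compare directly.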
Orchard, P.; Blackwell, T. W.; Kachuri, L.; Castaldi, P. J.; Cho, M. H.; Christenson, S. A.; Durda, P.; Gabriel, S.; Hersh, C. P.; Huntsman, S.; Hwang, S.; Joehanes, R.; Johnson, M.; Li, X.; Lin, H.; Liu, C.-T.; Liu, Y.; Mak, A. C. Y.; Manichaikul, A. W.; Paik, D.; Saferali, A.; Smith, J. D.; Taylor, K. D.; Tracy, R. P.; Wang, J.; Wang, M.; Weinstock, J. S.; Weiss, J.; Wheeler, H. E.; Zhou, Y.; Zoellner, S.; Wu, J. C.; Mestroni, L.; Graw, S.; Taylor, M. R. G.; Ortega, V. E.; Johnson, C. W.; Gan, W.; Abecasis, G.; Nickerson, D. A.; Gupta, N.; Ardlie, K.; Woodruff, P. G.; Zheng, Y.; Bowler, R. P
Most genetic variants associated with complex traits and diseases occur in non-coding genomic regions and are hypothesized to regulate gene expression. To understand the genetics underlying gene expression variability, we characterize 14,324 ancestrally diverse RNA-sequencing samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program and integrate whole genome sequencing data to perform cis and trans expression and splicing quantitative trait locus (cis-/trans-e/sQTL) analyses in six tissues and cell types, most notably whole blood (N=6,454) and lung (N=1,291). We show this dataset enables greater detection of secondary cis-e/sQTL signals than was achieved in previous studies, and that secondary cis-eQTL and primary trans-eQTL signal discovery is not saturated even though eGene discovery is. Most TOPMed trans-eQTL signals colocalize with cis-e/sQTL signals, suggesting many trans signals are mediated by cis signals. We fine-map European UK BioBank GWAS signals from 164 traits and colocalize the resulting 34,107 fine-mapped GWAS signals with TOPMed e/sQTL signals, finding that of 10,611 GWAS signals with a colocalization, 7,096 GWAS signals colocalize with at least one secondary e/sQTL signal. These results demonstrate that larger e/sQTL analyses will continue to uncover secondary e/sQTL signals, and that these new signals will benefit GWAS interpretation.
Hof, J. J. P.; Ning, C.; Quinn, L.; Speed, D.
Common complex diseases are clinically heterogeneous, yet most genome-wide association studies (GWAS) assume cases are genetically homogeneous. This challenge is compounded in large-scale biobanks, which increasingly combine cases ascertained under different recruitment strategies, raising concerns that heterogeneous case definitions may dilute genetic signal. To address this, we developed StratGWAS, a scalable framework that leverages clinical features of heterogeneity to construct a transformed phenotype that better reflects genetic liability within diseases. StratGWAS stratifies cases using secondary phenotypic information such as age of onset, medication burden, or recruitment definition. StratGWAS then estimates genetic covariance between strata, and derives a transformed phenotype that upweights cases with higher inferred genetic liability. Through simulation studies (N = 100k) and application to the UK Biobank (N = 368k), we show that StratGWAS consistently outperformed standard GWAS methods. Applied to 21 UK Biobank traits, StratGWAS upweighted individuals with earlier disease onset and higher medication burden, yielding respectively 17% and 4% more independent genome-wide significant loci than standard case control GWAS. Applied to depression, StratGWAS upweighted individuals with multiple diagnoses, greater psychiatric comorbidity, or higher self reported depressive symptoms, identifying eight additional independent loci compared to case-control GWAS.
Chen, J.
Glycaemic traits are used to diagnose and monitor type 2 diabetes, and cardiometabolic health. To date, most genetic studies of glycaemic traits have focused on individuals of European ancestry. Here, we aggregated genome-wide association studies in up to 281,416 individuals without diabetes (30% non-European ancestry) with fasting glucose, 2h-glucose post-challenge, glycated haemoglobin, and fasting insulin data. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P < 5×10⁻⁸), 80% with no significant evidence of between-ancestry heterogeneity. Analyses restricted to European ancestry individuals with equivalent sample size would have led to 24 fewer new loci. Compared to single-ancestry, equivalent sized trans-ancestry fine-mapping reduced the number of estimated variants in 99% credible sets by a median of 37.5%. Genomic feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways. Our results increase understanding of diabetes pathophysiology by use of trans-ancestry studies for improved power and resolution.
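The "99% credible set" mentioned in the abstract above is a fine-mapping summary: the smallest group of variants whose posterior probabilities of being causal add up to at least 99%. A minimal sketch of that construction follows, assuming per-variant posterior probabilities are already available. The function name, variant IDs and probabilities are all hypothetical and are not taken from the study.

```python
def credible_set(posteriors, coverage=0.99):
    """Return the smallest list of variants whose posterior mass reaches `coverage`.

    posteriors -- dict mapping variant ID to posterior probability of causality
    """
    total = sum(posteriors.values())
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, probability in ranked:
        chosen.append(variant)
        cumulative += probability / total  # normalise in case inputs do not sum to 1
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical locus: a diffuse single-ancestry signal versus a sharper combined one.
single_ancestry = {"rsA": 0.40, "rsB": 0.30, "rsC": 0.20, "rsD": 0.07, "rsE": 0.03}
trans_ancestry = {"rsA": 0.80, "rsB": 0.195, "rsC": 0.003, "rsD": 0.001, "rsE": 0.001}

print(len(credible_set(single_ancestry)), len(credible_set(trans_ancestry)))
```

In this toy case the sharper posterior yields a smaller set, which is the kind of reduction the abstract reports as a median across loci.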
Adewuyi, E. O.; Auta, A.; Okoh, O. S.; Selmer, K.; Gervin, K.; Nyholt, D. R.; Pereira, G.
Observational studies associate type 2 diabetes (T2D) with increased dementia risk; however, the specificity of this relationship to Alzheimer's disease (AD) and its biological underpinnings remain unresolved. We apply an integrative cross-omic framework to dissect genetic links between AD and T2D. Genome-wide analyses reveal a modest positive genetic correlation and robust polygenic sign concordance of AD with T2D. High-resolution analyses demonstrate locus-specific heterogeneity, with coexisting positive and predominantly negative correlations, and strong inverse associations at APOE and HLA. Cross-trait GWAS meta-analyses indicate that most genome-wide significant signals reflect trait-specific effects, with only a limited set of variants supported in both AD and T2D. Colocalisation reveals distinct causal variants at most shared loci. Gene-based analyses highlight convergence at functional genes, including PLEKHA1, VKORC1, ACE, and APOE, without implying concordant variant-level effects. Bidirectional Mendelian randomisation (MR) shows no evidence of a causal relationship between AD and T2D in either direction. Summary-data MR prioritises genes whose expression or methylation affects both AD and T2D, mostly with opposing effects. Only PLEKHA1 (eQTL) and CAMTA2 (mQTL) show concordant positive associations. Five genes, GALNT10, HSD3B7, BCKDK, KAT8, and ACE, are supported across both regulatory layers, while numerous signals cluster within a regulatory hotspot at 16p11.2, supporting convergent transcriptional and epigenetic involvement, despite directional divergence. These results refine the AD-T2D relationship; rather than a simple shared-risk model, overlap reflects locus-specific heterogeneity and cross-omic convergence often showing opposing effects on AD versus T2D risk, consistent with antagonistic pleiotropy.
Zheng, H. B.; Doran, B. A.; Kimler, K.; Yu, A.; Tkachev, V.; Niederlova, V.; Cribbin, K.; Fleming, R.; Bratrude, B.; Betz, K.; Cagnin, L.; McGuckin, C.; Keskula, P.; Albanese, A.; Sacta, M.; de Sousa Casal, J.; Taliaferro, F.; Ford, M.; Ambartsumyan, L.; Suskind, D. L.; Lee, D.; Deutsch, G.; Deng, X.; Collen, L. V.; Mitsialis, V.; Snapper, S. B.; Wahbeh, G.; Shalek, A. K.; Ordovas-Montanes, J.; Kean, L. S.
Crohn's disease is an inflammatory bowel disease (IBD) commonly treated through anti-TNF blockade. However, most patients still relapse and inevitably progress. Comprehensive single-cell RNA-sequencing (scRNA-seq) atlases have largely sampled patients with established treatment-refractory IBD, limiting our understanding of which cell types, subsets, and states at diagnosis anticipate disease severity and response to treatment. Here, through combining clinical, flow cytometry, histology, and scRNA-seq methods, we profile diagnostic human biopsies from the terminal ileum of treatment-naive pediatric patients with Crohn's disease (pediCD; n=14), matched repeat biopsies (pediCD-treated; n=8) and from non-inflamed pediatric controls with functional gastrointestinal disorders (FGID; n=13). To resolve and annotate epithelial, stromal, and immune cell states among the 201,883 baseline single-cell transcriptomes, we develop a principled and unbiased tiered clustering approach, ARBOL. Through flow cytometry and scRNA-seq, we observe that treatment-naive pediCD and FGID have similar broad cell type composition. However, through high-resolution scRNA-seq analysis and microscopy, we identify significant differences in cell subsets and states that arise during pediCD relative to FGID. By closely linking our scRNA-seq analysis with clinical meta-data, we resolve a vector of T cell, innate lymphocyte, myeloid, and epithelial cell states in treatment-naive pediCD (pediCD-TIME) samples which can distinguish patients along the trajectory of disease severity and anti-TNF response. By using ARBOL with integration, we position repeat on-treatment biopsies from our patients between treatment-naive pediCD and on-treatment adult CD. We identify that anti-TNF treatment pushes the pediatric cellular ecosystem towards an adult, more treatment-refractory state. Our study jointly leverages a treatment-naive cohort, high-resolution principled scRNA-seq data analysis, and clinical outcomes to understand which baseline cell states may predict Crohn's disease trajectory.
Kerner, G.; Kamitaki, N.; Strober, B.; Price, A. L.
Genome-wide association studies (GWAS) have identified thousands of disease-associated loci, yet their interpretation remains limited by the heterogeneity of underlying biological processes. We propose Joint Pleiotropic and Epigenomic Partitioning (J-PEP), a clustering framework that integrates pleiotropic SNP effects on auxiliary traits and tissue-specific epigenomic data to partition disease-associated loci into biologically distinct clusters. To benchmark J-PEP against existing methods, we introduce a metric, Pleiotropic and Epigenomic Prediction Accuracy (PEPA), that evaluates how well the clusters predict SNP-to-trait and SNP-to-tissue associations using off-chromosome data, avoiding overfitting. Applying J-PEP to GWAS summary statistics for 165 diseases/traits (average N=290K), we attained 16-30% higher PEPA than pleiotropic or epigenomic partitioning approaches with larger improvements for well-powered traits, consistent with simulations; these gains arise from J-PEP's tendency to upweight correlated structure (signals present in both auxiliary trait and tissue data), thereby emphasizing shared components. For type 2 diabetes (T2D), J-PEP identified clusters refining canonical pathological processes while revealing underexplored immune and developmental signals. For hypertension (HTN), J-PEP identified stromal and adrenal-endocrine processes that were not identified in prior analyses. For neutrophil count, J-PEP identified hematopoietic, hepatic-inflammatory, and neuroimmune processes, expanding biological interpretation beyond classical immune regulation. Notably, integrating single-cell chromatin accessibility data refined bulk-based clusters, enhancing cell-type resolution and specificity. For T2D, single-cell data refined a bulk endocrine cluster to pancreatic islet β-cells, consistent with established β-cell dysfunction in insulin deficiency; for HTN, single-cell data refined a bulk endocrine cluster to adrenal cortex cells, consistent with a GO enrichment for neutrophil-mediated inflammation that implicates feedback between aldosterone production in the adrenal gland and local immune signaling. In conclusion, J-PEP provides a principled framework for partitioning GWAS loci into interpretable, tissue-informed clusters that provide biological insights on complex disease.
Mountjoy, E.; Schmidt, E. M.; Carmona, M.; Peat, G.; Miranda, A.; Fumis, L.; Hayhurst, J.; Buniello, A.; Schwartzentruber, J.; Karim, M. A.; Wright, D.; Hercules, A.; Papa, E.; Fauman, E.; Barrett, J. C.; Todd, J. A.; Ochoa, D.; Dunham, I.; Ghoussaini, M.
Genome-wide association studies (GWAS) have identified many variants robustly associated with complex traits but identifying the gene(s) mediating such associations is a major challenge. Here we present an open resource that provides systematic fine-mapping and protein-coding gene prioritization across 133,441 published human GWAS loci. We integrate diverse data sources, including genetics (from GWAS Catalog and UK Biobank) as well as transcriptomic, proteomic and epigenomic data across many tissues and cell types. We also provide systematic disease-disease and disease-molecular trait colocalization results across 92 cell types and tissues and identify 729 loci fine-mapped to a single coding causal variant and colocalized with a single gene. We trained a machine learning model using the fine mapped genetics and functional genomics data using 445 gold standard curated GWAS loci to distinguish causal genes from background genes at the same loci, outperforming a naive distance based model. Genes prioritized by our model are enriched for known approved drug targets (OR = 8.1, 95% CI: [5.7, 11.5]). These results will be regularly updated and are publicly available through a web portal, Open Targets Genetics (OTG, http://genetics.opentargets.org), enabling users to easily prioritize genes at disease-associated loci and assess their potential as drug targets.
Karczewski, K. J.; Gupta, R.; Kanai, M.; Lu, W.; Tsuo, K.; Wang, Y.; Walters, R. K.; Turley, P.; Callier, S.; Baya, N.; Palmer, D. S.; Goldstein, J. I.; Sarma, G.; Solomonson, M.; Cheng, N.; Bryant, S.; Churchhouse, C.; Cusick, C. M.; Poterba, T.; Compitello, J.; King, D.; Zhou, W.; Seed, C.; Finucane, H. K.; Daly, M. J.; Neale, B. M.; Atkinson, E. G.; Martin, A. R.
Large biobanks, such as the UK Biobank (UKB), enable massive phenome by genome-wide association studies that elucidate genetic etiology of complex traits. However, individuals from diverse genetic ancestry groups are often excluded from association analyses due to concerns about population structure introducing false positive associations. Here, we generate mixed model associations and meta-analyses across genetic ancestry groups, inclusive of a larger fraction of the UKB than previous efforts, to produce freely-available summary statistics for 7,266 traits. We build a quality control and analysis framework informed by genetic architecture. Overall, we identify 14,676 significant loci (p < 5 × 10⁻⁸) in the meta-analysis that were not found in the EUR genetic ancestry group alone, including novel associations, for example between CAMK2D and triglycerides. We also highlight associations from ancestry-enriched variation, including a known pleiotropic missense variant in G6PD associated with several biomarker traits. We release these results publicly alongside FAQs that describe caveats for interpretation of results, enhancing available resources for interpretation of risk variants across diverse populations.
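Combining per-ancestry association results into a single meta-analysis, as the abstract above describes, is commonly done with inverse-variance weighting. The sketch below shows a fixed-effect version for one variant. The abstract does not specify the weighting scheme the authors used, so this is an assumption, and the function name and example effect sizes are hypothetical.

```python
import math


def inverse_variance_meta(estimates):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    estimates -- list of (beta, standard_error) pairs, one per ancestry group
    Returns (pooled_beta, pooled_standard_error, z_score).
    """
    # Each group is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for _, se in estimates]
    weight_sum = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical per-ancestry results for a single variant.
per_group = [(0.10, 0.02), (0.20, 0.04)]
pooled_beta, pooled_se, z_score = inverse_variance_meta(per_group)
```

Because the pooled standard error shrinks as groups are added, a variant that is sub-threshold in each group alone can become significant in the combined analysis, which is consistent with loci appearing only in the meta-analysis.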
Shringarpure, S. S.; Wang, W.; Jiang, Y.; Acevedo, A.; Dhamija, D.; Cameron, B.; Jubb, A.; Yue, P.; The 23andMe Research Team; Sarov-Blat, L.; Gentleman, R.; Auton, A.
A key challenge in the study of rare disease genetics is assembling large case cohorts for well-powered studies. We demonstrate the use of self-reported diagnosis data to study rare diseases at scale. We performed genome-wide association studies (GWAS) for 33 rare diseases using self-reported diagnosis phenotypes and re-discovered 29 known associations to validate our approach. In addition, we performed the first GWAS for Duane retraction syndrome, vestibular schwannoma and spontaneous pneumothorax, and report novel genome-wide significant associations for these diseases. We replicated these novel associations in non-European populations within the 23andMe, Inc. cohort as well as in the UK Biobank cohort. We also show that mixed model analyses including all ethnicities and related samples increase the power for finding associations in rare diseases. Our results, based on analysis of 19,084 rare disease cases for 33 diseases from 7 populations, show that large-scale online collection of self-reported data is a viable method for discovery and replication of genetic associations for rare diseases. This approach, which is complementary to sequencing-based approaches, will enable the discovery of more novel genetic associations for increasingly rare diseases across multiple ancestries and shed more light on the genetic architecture of rare diseases.
Bherer, C.; Grenier, J.-C.; Pelletier, J.; Boucher, G.; Gagnon, G.; Goyette, P.; Ashton-Beaucage, D.; Stevens, C.; Battat, R.; Bitton, A.; Campeau, P.; Laprise, C.; Huang, H.; Daly, M. J.; Taliun, D.; Hussin, J. G.; Mooser, V.; Rioux, J. D.
The genetic features of founder populations with recent bottlenecks, causing some deleterious variants to rise to higher frequencies, can enhance the power of rare variant association studies. French Canadians from Quebec represent a recent founder population with a particular disease heritage comprising more than 30 prevalent Mendelian conditions. Here, we characterize coding variation in this founder population using exome sequencing data from 2,820 French-Canadian participants - patients with inflammatory bowel diseases (IBD), parents and controls from the Quebec IBD cohort. We find that 18% of rare coding variants are 10-100 times more frequent than in non-Finnish Europeans (NFE). A total of 4,133 missense and loss-of-function variants were significantly enriched with a median 28-fold enrichment, revealing the potential for genotype-phenotype associations in this population. We describe significantly enriched pathogenic variants, including those known to account for the increased prevalence of rare diseases in FC compared to other European descent populations, such as Agenesis of corpus callosum and peripheral neuropathy (SLC12A6) and Leigh Syndrome French Canadian type (LRPPRC). Finally, we investigate whether rare protein-coding variants, enriched in French Canadians by the founder effect, contribute to the risk of IBD using trio and case/control cohorts. In addition to replicating associations in NOD2 and IL23R, we identified new candidate association signals, including enriched variants in SLC35E3 and ARSA. Our findings show that, even in well-characterized founder populations like the French Canadians, there remains untapped potential for genetic discovery, revealing both rare and complex disease risk factors through enriched coding variation.
Cromie, G.; Lo, R.; Morgan, T. S.; Clark, A.; Ashmead, J.; Timour, M. S.; Sirr, A.; Akey, J. M.; Dudley, A. M.
The budding yeast Saccharomyces cerevisiae is a remarkably adaptable organism that thrives in diverse environments. Global sequencing of natural isolates has revealed extensive genetic diversity within the species. Here, we describe the construction and characterization of CYClones (Collaborative Yeast Cross clones), a library of 11,392 segregants generated from a multiparent funnel cross of eight genetically diverse parental strains. To enable the genetic dissection of complex traits, we imputed whole-genome sequences for all segregants and show that CYClones captures a substantial fraction of the global genetic diversity of S. cerevisiae. Haplotype representation is well maintained, with each parental haplotype present at >5% frequency across >95% of the genome. Simulations demonstrate that CYClones has ≥95% power to detect variants with heritability as low as 0.36%, with mapping resolution often finer than the length of a single gene. In summary, CYClones is a powerful community resource for dissecting the genetic architecture of complex and quantitative traits, uncovering context-dependent mutational effects, and identifying causal variants underlying phenotypic diversity.
Kim, A.; Zhang, Z.; Legros, C.; Lu, Z.; de Smith, A.; Moore, J.; Mancuso, N.; Gazal, S.
The SNP-heritability of human diseases is extremely enriched in candidate regulatory elements (cREs) from disease-relevant cell types. Critical next steps are to understand whether these enrichments are driven by multiple causal cell types and whether individual variants impact disease risk via a single cell type or multiple cell types. Here, we propose CT-FM and CT-FM-SNP, two methods accounting for cREs shared across cell types to identify independent sets of causal cell types for a trait and its candidate causal variants, respectively. We applied CT-FM to 63 GWAS summary statistics (average N = 417K) using 924 cRE annotations, primarily from ENCODE4. CT-FM inferred 79 sets of causal cell types, with corresponding SNP-annotations explaining 39.0 ± 1.8% of trait SNP-heritability. It identified 14 traits with independent causal cell types, uncovering previously unexplored cellular mechanisms in height, schizophrenia and autoimmune diseases. We applied CT-FM-SNP to 39 UK Biobank traits and predicted high-confidence causal cell types for 3,091 candidate causal non-coding SNP-trait pairs. Our results suggest that most SNPs affect a phenotype via a single set of cell types, whereas pleiotropic SNPs might target different cell types depending on the phenotype context. Altogether, CT-FM and CT-FM-SNP shed light on how genetic variants act collectively and individually at the cellular level to affect disease risk.
Herrero Zazo, M.; Fitzgerald, T. W.; Banasik, K.; Louloudis, I.; Vassos, E.; Colon-Ruiz, C.; Segura-Bedmar, I.; Kessing, L. V.; Ostrowski, S. R.; Pedersen, O. B.; Schork, A.; Sorensen, E.; Ullum, H.; Werge, T.; Bruun, M. T.; Christoffersen, L. A.; Didriksen, M.; Erikstrup, C.; Aagaard, B.; Mikkelsen, C.; DBDS Genomic Consortium; Lewis, C.; Brunak, S.; Birney, E.
Major depressive disorder is a complex condition with diverse presentations and polygenic underpinnings. Leveraging large biobanks linked to primary care prescription data, we developed a data-driven approach based on antidepressant prescription trajectories for patient stratification and novel phenotype identification. We extracted quantitative prescription trajectories for 56,951 UK Biobank (UKB) and 64,609 Danish National Biobank (CHB+DBDS) individuals. Using Hidden Markov Models and K-means clustering, we identified five and six patient clusters, respectively. Multinomial logistic regression and non-parametric association tests, using clinical information, enabled patient group characterization. We consistently identified three common patient groups across cohorts: first, a majority group of individuals with mild to moderate depression; second, those with severe mental illness (i.e., a group with a higher likelihood of psychiatric diagnoses, such as bipolar depression, with odds ratios: OR_UKB = 1.87 [95% CI = 1.48, 2.35], p = 2.7e-6; OR_CHB+DBDS = 1.69 [95% CI = 1.41, 2.02], p = 2.3e-7); and third, patients with less severe forms of depression or receiving treatment for conditions other than depression (i.e., a group with a lower likelihood of depression diagnosis: OR_UKB = 0.80 [95% CI = 0.74, 0.85], p = 3e-10; OR_CHB+DBDS = 0.77 [95% CI = 0.73, 0.82], p < 1e-10). Genome-wide association studies (GWAS) revealed 14 significant loci, including USP4 and BCHE on chromosome 3, as well as a locus associated with the drug metabolising enzyme CYP2D6. These findings, and the reproducibility across cohorts, demonstrate the power of unsupervised phenotyping from primary care prescriptions for patient stratification and pharmacogenetics research.
Boltz, T.; Bot, M.; Lapinska, S.; Schwarz, T.; Hou, K.; Garske, K. M.; Freund, M. K.; Bearden, C. E.; Macaya, G.; Lopez-Jaramillo, C.; Freimer, N. B.; Boks, M. P.; Kahn, R. S.; Pasaniuc, B.; Ophoff, R. A.
Quantitative Trait Locus (QTL) analysis of molecular data has identified genetic variants associated with traits such as gene expression, and colocalization of these functional QTL with GWAS risk loci has offered insights into the genetic basis of human disease. We employed gene expression (RNA-seq) and chromatin accessibility (ATAC-seq) obtained from human primary fibroblasts to investigate quantitative trait loci (QTLs) in cohorts ascertained for bipolar disorder of European (n=150) and Latin American (n=96) ancestries. Leveraging data from three countries of origin (The Netherlands, Colombia, Costa Rica) within our cohort, we characterized differences among individuals at the SNP, gene, and accessible-chromatin levels to compute ancestry-specific expression (e)QTLs and chromatin-accessibility (ca)QTLs. Across ancestries, we observed R² ≥ 0.93 for eQTL effect sizes and R² ≥ 0.95 for caQTLs, indicating a high degree of concordance. Integrating chromatin data with expression and genotype information enabled precise fine-mapping of eQTLs, yielding 203 high-confidence (posterior probability > 90%) regulatory pathways. In downstream analyses, transcriptome-wide (TWAS) and chromatin-wide (CWAS) association studies with brain- and skin-related GWAS identified 36 TWAS-significant genes and 77 CWAS-significant open chromatin regions. These findings underscore the shared genetic regulatory mechanisms across European and Latin American ancestries, while demonstrating that ancestry-specific reference panels enhance the accuracy of TWAS and CWAS in diverse populations.
Liu, Z.; Fu, B.; Jeong, M.; Anand, P.; Anand, A.; Jang, S.-K.; Gorla, A.; Zhu, J.; Pajukanta, P.; Palamara, P. F.; Zaitlen, N.; Border, R.; Sankararaman, S.
Whole-exome sequencing (WES) enables high-resolution interrogation of the contribution of rare coding variants to complex trait variation. However, existing methods for heritability estimation attributed to rare-coding variants are often limited by the effects of linkage disequilibrium (LD) and by the sparse nature of rare variant data. We introduce FLEX (Fast, LD-aware Estimation of eXome-wide and gene-level heritability), a scalable and flexible framework for estimating and partitioning heritability across genes or sets of genes using WES data. FLEX integrates all coding variants, from common to ultra-rare, within a unified model and corrects for LD-induced effects to improve the accuracy of heritability estimates. In addition, FLEX supports both individual-level and summary statistic data and is computationally efficient for biobank-scale datasets. Through extensive simulations, we show that FLEX is well-calibrated while providing accurate heritability estimates. We applied FLEX to WES data across N = 153,351 unrelated European ancestry individuals and 20 quantitative traits in the UK Biobank. We identified 64 gene-trait pairs with significant gene-level heritability (p < 0.05/18,624, accounting for the number of protein-coding genes tested), among which rare coding variants explained 38% of gene-level heritability, on average. Compared to heritability estimates from genome-wide imputed SNPs, incorporation of rare and ultra-rare coding variants led to a 24.8% increase in heritability on average, while effect sizes at rare and ultra-rare variants are substantially larger (≈18x on average). Partitioning across variant effect annotations, we find that predicted loss-of-function variants had stronger individual effects than missense variants (24% on average) while missense variants accounted for a greater share of rare coding heritability. Together, FLEX provides an adaptable and accurate approach for quantifying gene-level heritability, advancing our understanding of the genetic architecture of complex traits, and facilitating the discovery of trait-relevant genes.
Huang, X.; Wang, Y.; Zhao, Q.; Gao, Z.
GWAS increasingly reveal shared genetic influences across neurodevelopmental, psychiatric, and neurodegenerative traits. However, cross-trait genetic covariance derived from GWAS summary statistics can be inflated by sample overlap and other structured background effects, obscuring higher-order genetic organization. We extend PathGPS, a recently developed statistical method that estimates an adjusted genetic covariance by subtracting a background covariance learned from weakly associated variants, and then extracts reproducible low-rank structure using rotation and bootstrap aggregation. When applied to 15 phenotypes related to neurodevelopmental and neurodegenerative disorders, the adjusted analysis yields four stable clusters with an interpretable topology. Adjusting for background covariance, which appears to be related to traumatic life experiences, sharpens the cluster boundaries and substantially shifts the clustering result for post-traumatic stress disorder. Simulations with controlled overlap and structured background covariance show that PathGPS has improved factor recovery.
Ebrahimi, E.; Sangphukieo, A.; Park, H. A.; Gaborieau, V.; Ferreiro-Iglesias, A.; Diergaarde, B.; Ahrens, W.; Alemany, L.; Arantes, L. M.; Betka, J.; Bratman, S. V.; Canova, C.; Conlon, M. S.; Conway, D. I.; Cuello, M.; Curado, M. P.; de Carvalho, A. C.; de Oliviera, J. C.; Gormley, M.; Hadji, M.; Hargreaves, S.; Healy, C. M.; Holcatova, I.; Hung, R. J.; Kowalski, L. P.; Lagiou, P.; Lagiou, A.; Liu, G.; Macfarlane, G. J.; Olshan, A. F.; Perdomo, S.; Pinto, L. F.; Podesta, J. R. V.; Polesel, J.; Pring, M.; Rashidian, H.; Gama, R. R.; Richiardi, L.; Robinson, M.; Rodriguez-Urrego, P. A.; Santi,
In this multi-ancestry genome-wide association study (GWAS) and fine-mapping study of head and neck squamous cell carcinoma (HNSCC) subsites, we analysed 19,073 cases and 38,857 controls and identified 29 independent novel loci. We provide robust evidence that a 3′ UTR variant in TP53 (rs78378222, T>G) confers a 40% reduction in odds of developing overall HNSCC. We further examine the gene-environment relationship of BRCA2 and ADH1B variants, demonstrating that their effects act through both smoking and alcohol use. Through analyses focused on the human leukocyte antigen (HLA) region, we highlight that although human papillomavirus (HPV)(+) oropharyngeal cancer (OPC), HPV(-) OPC and oral cavity cancer (OC) all show GWAS signal at 6p21, each subsite has distinct associations at the variant, amino acid, and 4-digit allele level. We also defined the specific amino acid changes underlying the well-known DRB1*13:01-DQA1*01:03-DQB1*06:03 protective haplotype for HPV(+) OPC. We show greater heritability of HPV(+) OPC compared to other subsites, likely to be explained by HLA effects. These findings advance our understanding of the genetic architecture of head and neck squamous cell carcinoma, providing important insights into the role of genetic variation across ancestries, tumor subsites, and gene-environment interactions.